Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech

Authors

  • Claude Barras
  • Edouard Geoffrois
  • Zhibiao Wu
  • Mark Liberman
Abstract

This paper describes the first version of “Transcriber”, a tool for segmenting, labeling and transcribing speech. It is developed under Unix in the Tcl/Tk scripting language with extensions in C, and is available as free software. The environment offers the basic functions necessary for segmenting, labeling and transcribing long-duration signals. The signal editor and the text editor are integrated and synchronized in order to display and play the current segment. The output is in a standard SGML format. Multiple languages are supported. The tool can be ported to various platforms and is flexible enough that new functions can be added easily. We hope that such a portable, widely available and flexible tool will benefit the whole community and make it easier to develop and share corpora.

Related resources

Transcribing with Annotation Graphs

Transcriber is a tool for manual annotation of large speech files. It was originally designed for the broadcast news transcription task. The annotation file format was derived from previous formats used for this task, and many related features were hard-coded. In this paper we present a generalization of the tool based on the annotation graph formalism, and on a more modular design. This will a...

Labeler agreement in transcribing Korean intonation with K-ToBI

This paper reports labeler agreement in the transcription of Korean prosody using Korean ToBI (K-ToBI) [9]. Twenty utterances representing five different types of speech were produced by 18 speakers and transcribed by 21 labelers differing in their levels of experience with K-ToBI. Following the stringent metric used for English ToBI evaluation [14,12], consistency was measured in terms of the ...

Transcriber: Development and use of a tool for assisting speech corpora production

We present “Transcriber”, a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with exten...

Transcribing continuous speech using mismatched crowdsourcing

Mismatched crowdsourcing derives speech transcriptions using crowd workers unfamiliar with the language being spoken. This approach has been demonstrated for isolated word transcription tasks, but never yet for continuous speech. In this work, we demonstrate mismatched crowdsourcing of continuous speech with a word error rate of under 45% in a large-vocabulary transcription task of short speech...

A Computer Assisted Speech Transcription System

Current automatic speech transcription systems can achieve a high accuracy although they still make mistakes. In some scenarios, high quality transcriptions are needed and, therefore, fully automatic systems are not suitable for them. These high accuracy tasks require a human transcriber. However, we consider that automatic techniques could improve the transcriber’s efficiency. With this idea w...

Publication date: 1998